Study Question 3: Are concentrations above or below a criterion?

C.3 Study Question 3: Are concentrations above or below a criterion?

To ensure a valid test, it is important to understand how the criterion used for comparison was derived and, based on that understanding, to have a well-defined null hypothesis. The criterion can be an MCL, a risk-based value, or a fixed background limit, and it may represent a single regulatory value, the mean of a population, or a percentile. Take this into account when defining the null hypothesis (and ultimately the comparison method) so that the selected test reflects the intent of the criterion. For example, if the criterion is a not-to-exceed value, individual sample results can be compared to it in much the same manner as is done with prediction limits; alternatively, if the criterion is derived to represent an average concentration ceiling, an upper confidence limit around the mean of the compliance data is the appropriate test statistic.

This question is relevant during release detection, site characterization, monitoring, and closure stages of the project life cycle.

Selecting and Characterizing the Data Set

Examine the site data set to determine whether you are going to compare intrawell or interwell data to a criterion (see Section 3.6.5). Intrawell comparisons are most common. If interwell data are going to be used, ensure that the sample data share the same hydrogeologic and geochemical characteristics before combining these data, and test for significant spatial variability. In either case, examine the site data to determine what distributional assumption should inform selection of statistical tests (see Section 4.3.1: Physical Site Conditions and Section 4.2.1: Background Conditions). Refer to Section 3.4: Common Statistical Assumptions for further discussion concerning how the following requirements may impact statistical analysis results.

Use box plots, probability plots, Dixon's test, or Rosner's test to check for outliers.
Check that the mean and variance are stable over the time frame (time series plot).
Check that no autocorrelation exists between successive sampling events.
Check that no significant trends exist (time series plot). If the data set exhibits significant trends, it may be appropriate to select a subset of the data to represent current concentrations.
Determine the distribution of the data (for example, normal or lognormal) using the skewness coefficient, the Shapiro-Wilk test, or censored probability plots.
Estimate the mean and standard deviation of a left-censored sample using Kaplan-Meier when 50% or less of the data set is nondetect.
See also Section 4.1: Considerations for Statistical Analysis.
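Two of the screening checks above, the skewness coefficient and the absence of autocorrelation between successive sampling events, can be sketched in a few lines. This is a minimal illustration only: the function names and the sample values are hypothetical, the skewness shown is the simple moment-based coefficient, and the lag-1 sample autocorrelation is one common way (assumed here, not prescribed by this document) to screen for serial dependence.

```python
def skewness_coefficient(values):
    """Moment-based skewness m3 / m2**1.5; near 0 for symmetric data."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def lag1_autocorrelation(values):
    """Sample lag-1 autocorrelation of a time-ordered series."""
    n = len(values)
    mean = sum(values) / n
    num = sum((values[i] - mean) * (values[i + 1] - mean) for i in range(n - 1))
    den = sum((v - mean) ** 2 for v in values)
    return num / den


# Hypothetical quarterly results (ug/L) from one well, oldest first.
results = [4.1, 3.8, 5.2, 4.6, 4.9, 3.5, 4.4, 5.0]
print("skewness:", round(skewness_coefficient(results), 3))
print("lag-1 autocorrelation:", round(lag1_autocorrelation(results), 3))
```

Values of either statistic far from zero suggest that a normal-theory test on the raw data, or the assumption of independent sampling events, deserves a closer look before a test is selected.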

Statistical Methods and Tools

There are two broad approaches for analyzing well data to determine whether chemical concentrations are above a criterion: comparison of pooled interwell compliance data to the criterion and comparison of intrawell compliance data to the criterion.

The statistical tests most commonly used are confidence intervals or limits, tolerance limits, prediction limits, and the one sample t-test. Confidence intervals are constructed around a statistic of interest (for example, the mean, the median, or a certain percentile), while prediction and tolerance limits are extreme values beyond which only a small portion of the data population is expected to fall. The one sample t-test compares a statistic of interest from the data to a criterion based on the same statistic of interest derived from the background. Site-specific considerations or regulatory requirements usually determine which parameters and tests are appropriate.

Limits are most often used to compare sampling data to a fixed criterion. Two questions can be asked: whether the groundwater concentration of a specific chemical has exceeded a criterion, or whether it has fallen below a criterion. In determining whether a criterion has been exceeded, the lower confidence limit is of primary interest. In determining whether the concentration has fallen below a criterion, the upper confidence limit, tolerance limit, or prediction limit is most important.

As an example of limit selection, if the criterion being used is a health-based concentration and the mean exposure should not exceed the criterion, then select a predetermined confidence level and require that the upper confidence limit on the mean (UCL) be below the standard. Likewise, if you are examining groundwater data that have historically been above a criterion, you want the UCL to be below the standard. The scenario is different when assuming that the well being monitored is not contaminated. In this case, retain the assumption until the lower confidence limit is above the criterion.
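The decision logic in the two scenarios above can be sketched as follows. This is an illustration under stated assumptions, not a prescribed procedure: it assumes approximately normal, independent data, uses the usual one-sided limits on a mean (mean minus or plus t times s over the square root of n), and expects the caller to supply the t critical value from a standard table. The function names, the `assume_clean` flag, and the data are hypothetical.

```python
import math
import statistics


def one_sided_limits(values, t_crit):
    """Return (LCL, UCL) on the mean: mean -/+ t_crit * s / sqrt(n)."""
    n = len(values)
    mean = statistics.mean(values)
    s = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    half_width = t_crit * s / math.sqrt(n)
    return mean - half_width, mean + half_width


def compare_to_criterion(values, criterion, t_crit, assume_clean):
    """Apply the limit that matches the starting assumption about the well."""
    lcl, ucl = one_sided_limits(values, t_crit)
    if assume_clean:
        # Well presumed uncontaminated: an exceedance requires LCL > criterion.
        return "exceeds criterion" if lcl > criterion else "no exceedance shown"
    # Well presumed contaminated: compliance requires UCL < criterion.
    return "below criterion" if ucl < criterion else "not shown to be below"


# Hypothetical data (ug/L); 1.895 is the tabulated one-sided 95% t value for 7 df.
data = [8, 9, 10, 11, 12, 10, 9, 11]
print(one_sided_limits(data, 1.895))
print(compare_to_criterion(data, criterion=12, t_crit=1.895, assume_clean=False))
```

The same data can support different conclusions depending on which assumption is the starting point, which is why the null hypothesis needs to be fixed before the comparison is made.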

If the fixed criterion is an average concentration, the appropriate statistical parameter to compare to it is the mean or median concentration from site data, using either a confidence interval or a one-sided t-test.

Parametric Confidence Intervals

Confidence intervals can be calculated under normal, lognormal, or nonparametric assumptions (see the next section for the nonparametric case) using the methods below:

confidence interval around a mean (see Section 5.2.2, Section 5.2.3, and Section 5.2.4).
confidence interval around an upper percentile (see Section 5.2.5).
robust confidence interval around a mean, which modifies the nonrobust calculations so that outlying observations in a data set can be accommodated (USEPA 1999).

Nonparametric Confidence Interval

Nonparametric confidence intervals can be calculated for non-normal data and for data that cannot reasonably be transformed to become normally distributed. They can also be used when the data set contains a high number of nondetects. Nonparametric confidence intervals are used to determine whether a criterion has been exceeded in much the same way as parametric confidence intervals. As with parametric confidence intervals, the assumption is that like parameters are being compared, for example, median to median. When data are ranked using nonparametric methods, it is relatively simple to estimate the percentiles in which the data fall, but it is more difficult to estimate parameters such as the mean and variance. Thus, nonparametric confidence intervals are built around the median (50th percentile) as opposed to the mean.
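A rank-based interval around the median can be sketched as follows. The sketch uses the standard order-statistic result that, for independent samples from a continuous distribution, the interval between the j-th smallest and j-th largest observations covers the median with a probability given by the binomial distribution with success probability 0.5. The function name and the data are hypothetical, and the interval shown is two-sided.

```python
from math import comb


def median_interval(values, confidence=0.95):
    """Narrowest symmetric order-statistic interval for the median.

    Returns (lower, upper, achieved_confidence), or None when the sample
    is too small to reach the requested confidence.
    """
    data = sorted(values)
    n = len(data)
    best = None
    for j in range(1, n // 2 + 1):
        # P(x_(j) < median < x_(n+1-j)) = sum_{k=j}^{n-j} C(n, k) / 2**n
        coverage = sum(comb(n, k) for k in range(j, n - j + 1)) / 2 ** n
        if coverage < confidence:
            break
        best = (data[j - 1], data[n - j], coverage)
    return best


# Hypothetical results (ug/L); nondetects would need a consistent ranking rule.
sample = [2.1, 3.4, 1.8, 5.0, 2.9, 4.2, 3.1, 2.5, 6.3, 3.8]
print(median_interval(sample, confidence=0.95))
```

Because only ranks are used, the achieved confidence moves in discrete steps and usually exceeds the requested level; with ten samples the second smallest and second largest values give roughly 97.9 percent confidence.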

Tolerance Limits

When a fixed criterion is an upper percentile or maximum, and no more than a small specified fraction of the individual concentration measurements should exceed the limit, a tolerance limit may be an appropriate statistic. As with confidence limits, a tolerance limit is one side of a tolerance interval, and it may be calculated based on either parametric or nonparametric assumptions.

Using the tolerance limit for testing, you can state that, "I'm 95 percent confident that a particular tolerance interval brackets some percentage, say 99 percent, of the population." Similarly, for the upper tolerance limit (UTL) you could say that, "I'm 95 percent confident that 99 percent of all data will be less than the UTL." Note that this statement is independent of the specific number of future samples, and this is what distinguishes tolerance limits from prediction limits.

It may also be useful to note that there is no difference between a 95 percent upper confidence limit on the 95th percentile and an upper tolerance limit with 95 percent coverage at 95 percent confidence.
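For the nonparametric case, the link between coverage, confidence, and sample size can be illustrated with a short calculation. It assumes the largest of n independent measurements is used as the upper tolerance limit, in which case the confidence that it covers a given fraction of the population is one minus that fraction raised to the power n. The function names are hypothetical.

```python
import math


def max_as_utl_confidence(n, coverage):
    """Confidence that the sample maximum exceeds the `coverage` quantile."""
    return 1.0 - coverage ** n


def min_samples_for_utl(coverage, confidence):
    """Smallest n for which the sample maximum achieves the stated confidence."""
    return math.ceil(math.log(1.0 - confidence) / math.log(coverage))


print(min_samples_for_utl(coverage=0.95, confidence=0.95))  # 59
print(round(max_as_utl_confidence(20, coverage=0.95), 3))
```

The calculation shows why nonparametric tolerance limits on small data sets carry less confidence than their parametric counterparts: with 20 samples, the maximum covers the 95th percentile with only about 64 percent confidence.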

Prediction Limits

Prediction limits (PLs) estimate an interval in which future observations will fall, with a defined probability, given the collected data. The calculation of PLs takes into consideration the number of future data to be compared, as well as the number of retests required to confirm a release.

As the number of chemicals increases, and the number of resampling instances increases, the upper prediction limit also increases. A corresponding decrease will occur in the power of the test (the probability of detecting a true exceedance). To reduce this source of error, limit the number of chemicals examined.
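The effect of the number of comparisons can be illustrated with a simple false positive calculation. The sketch assumes every well-chemical comparison is statistically independent and uses the same per-test false positive rate, which is a simplification; the function names and the monitoring network size are hypothetical.

```python
def site_wide_false_positive_rate(alpha_per_test, n_tests):
    """Chance of at least one false exceedance across n independent tests."""
    return 1.0 - (1.0 - alpha_per_test) ** n_tests


def per_test_alpha(alpha_site_wide, n_tests):
    """Per-test rate needed to hold the site-wide rate at alpha_site_wide."""
    return 1.0 - (1.0 - alpha_site_wide) ** (1.0 / n_tests)


# Hypothetical network: 10 wells x 5 chemicals = 50 comparisons per event.
n_tests = 10 * 5
print(round(site_wide_false_positive_rate(0.05, n_tests), 3))
print(round(per_test_alpha(0.10, n_tests), 5))
```

Holding the site-wide rate at a fixed target forces a much smaller per-test rate as chemicals are added, which pushes each prediction limit upward and reduces power. Dropping chemicals that are not useful indicators keeps the per-test rate, and therefore the limit, closer to what a single comparison would use.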

One sample t-test

The one sample t-test compares a statistic of interest (generally the mean) from the data to a criterion representing the same statistic from the background population. The test can be used on either interwell data or intrawell data. It is a parametric test.
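For the common case in which the statistic of interest is the mean, the test statistic can be written as below. The symbols are introduced here for illustration and do not come from this document: the sample mean is \bar{x}, the sample standard deviation is s, the number of samples is n, the criterion is C, and the significance level is \alpha. The form shown tests whether the mean exceeds the criterion and assumes approximately normal, independent data.

```latex
t = \frac{\bar{x} - C}{s / \sqrt{n}}, \qquad
\text{reject } H_0\colon \mu \le C \ \text{ if } \ t > t_{1-\alpha,\, n-1}
```

Rejecting the null hypothesis in this form is equivalent to finding that the one-sided lower confidence limit on the mean, \bar{x} - t_{1-\alpha,\, n-1}\, s/\sqrt{n}, lies above C. Reversing the hypotheses (testing whether the mean is below the criterion) reverses the inequality and corresponds to the upper confidence limit lying below C.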

Interpretation of Results and Associated Uncertainty

In selecting the statistical method, understand what the groundwater criterion represents and the consequences of exceeding that criterion. The statistical methods selected and the interpretation of their results may vary depending on the null hypothesis selected (for example, site data are above the criterion or site data are below the criterion). When using a risk-based criterion or background, typically the UCL of the mean or median concentration is compared to the criterion.

Closure determination is supported only when the entire confidence interval, including the UCL, is below the criterion. A small sample size can result in a wide confidence interval, such that the interval is not useful in identifying a difference. In such cases, additional samples will need to be collected to increase the sample size and narrow the interval. Chapter 21 and Chapter 22 of the Unified Guidance provide additional information regarding use of confidence intervals in monitoring for compliance and closure.